Metadata is typically a text record that describes the content of a digital asset, such as an image, and enables search and retrieval. Metadata can also carry other information about the asset. For example, if the digital asset is a picture, metadata may indicate the identity of individuals in the picture, when the picture was taken, and/or where the picture was taken. While image capture, including digitization, can be mechanized, metadata generation in general has not been mechanized and remains an excessively expensive and time-consuming operation.
Metadata accompanying a digital asset can be as significant a part of the package as the image pixels themselves. Metadata for digital assets, and for images in particular, is generated either manually or by editing a previous metadata record. Some metadata fields are filled with reference to a controlled vocabulary or authority, which enables a uniform and standardized practice for assigning names to people, places, etc. Other metadata fields, such as those associated with an abstract or object description, are free form. Because a free-form field has no authority to draw on, an abstract can take from two hours to two days to fill out, including the research. The Library of Congress Prints and Photographs Division estimates cataloging time for a digitized image at fifteen to thirty minutes for a brief description and up to an hour for a detailed item-level record.
Table 1 (below) shows an excerpt of the record for an image that is part of the collection of The Henry Ford Museum. The contents of some fields, such as the Subject fields, use terms drawn from a naming authority and follow agreed-upon standards. Others, such as Abstract, are free form. As noted above, researching and filling out the Abstract can take from two hours to two days. Other metadata fields can also be expensive to generate and generally require human input. As a result, organizations with large collections of images would benefit from technologies that reduce the amount of labor required to generate metadata. Assistance would be particularly beneficial in filling in the subject fields, where regularities exist: for example, some subject fields use plural terms, and some objects always have the same subject terms.
Title                     1913 Ford Model T Touring Car
Abstract                  This 1913 Model T carried on the tradition of low-cost, high-production vehicles Henry Ford established with the 1909 Model T. The 1913 Model T included a significant body redesign that became the iconic look of the car for the next 12 years. <snip>
Object name               Automobile
Made date                 1913-02
Physical description      Five passenger Model T Ford touring car with Brewster green metal body, black fenders, and running boards. Black leather top with side curtains. Black leather tufted seats. Folding windshield. Three doors. Taillight.
                          Specifications:
                          4 cylinder engine en bloc
                          3.75″ bore, 4″ stroke
Subject-Corporate names   Ford Motor Company
Subject-Topical terms     Assembly-line methods|Automobile industry|Mass production
Subject-Genre terms       Automobiles|Ford automobile|Ford Model T automobile|Touring cars
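The distinction between authority-controlled and free-form fields can be illustrated with a minimal Python sketch. It assumes that a record is held as a simple mapping from field name to text and that subject terms are separated by "|" as in Table 1. The authority set `TOPICAL_AUTHORITY` and the helper names `split_terms` and `unknown_terms` are hypothetical and introduced only for illustration; a real naming authority would be far larger and maintained externally.

```python
# Hypothetical authority list for illustration only; it contains just the
# topical terms that appear in Table 1.
TOPICAL_AUTHORITY = {
    "Assembly-line methods",
    "Automobile industry",
    "Mass production",
}


def split_terms(field_value):
    """Split a pipe-delimited subject field into a list of individual terms."""
    return [term.strip() for term in field_value.split("|") if term.strip()]


def unknown_terms(field_value, authority):
    """Return the terms that are not in the authority, in their original order."""
    return [term for term in split_terms(field_value) if term not in authority]


# Excerpt of the Table 1 record, held as a field-name-to-text mapping.
record = {
    "Title": "1913 Ford Model T Touring Car",
    "Object name": "Automobile",
    "Subject-Topical terms": "Assembly-line methods|Automobile industry|Mass production",
}

# Every topical term in the record is drawn from the authority, so nothing is flagged.
flagged = unknown_terms(record["Subject-Topical terms"], TOPICAL_AUTHORITY)
print(flagged)
```

A check of this kind applies only to controlled fields. A free-form field such as Abstract has no authority to validate against, which is part of why it is costly to fill in.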
Table 2 (below) shows excerpts from the bibliographic information for a photograph from the collection of the Library of Congress.
TITLE                 [Willow Creek, Creede, Colorado]
REPRODUCTION NUMBER   LC-DIG-fsac-1a34855 (digital file from original transparency)
SUMMARY               Photo shows buildings along Willow Creek, with Snowshoe Mountain in the distance. (Source: Flickr Commons project, 2009)
MEDIUM                1 transparency: color.
CREATED/PUBLISHED     1942 Dec.
CREATOR               Feininger, Andreas, 1906-1999, photographer.
NOTES                 Transfer from U.S. Office of War Information, 1944.
                      General information about the FSA/OWI Color Photographs is available at http://hdl.loc.gov/loc.pnp/pp.fsac
                      Title devised by Library staff. Title from FSA or OWI agency caption misidentified the view as “Lead mine, Creede, Colo.”
                      Additional information about this photograph might be available through the Flickr Commons project at http://www.flickr.com/photos/library_of_congress/2179914560
SUBJECTS              World War, 1939-1945, Rivers, Mountains, United States--Colorado--Creede
PART OF               Farm Security Administration - Office of War Information Collection 12002-62
A known approach to saving time in preparing a bibliographic record is cloning: copying an existing record, or starting from a template, that has the same medium type and/or comes from the same collection as the photograph being cataloged, and then editing fields as needed. Even so, cataloging at the Library of Congress generally takes fifteen to thirty minutes for a brief description and up to an hour for a detailed item-level record. The record, and in particular the Notes field, can also change and grow over time. The Library of Congress uploads images to Flickr and monitors comments, sometimes updating its records based on the comments (after verification). The record in Table 2 is an example of this process.
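The cloning approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a record is modeled as a dictionary keyed by the field names of Table 2, the Notes field is modeled as a list so that it can grow over time, and the helper `clone_record` and the placeholder title are hypothetical rather than part of any actual cataloging system.

```python
import copy


def clone_record(template, overrides):
    """Copy a template record, then replace the fields given in overrides."""
    new_record = copy.deepcopy(template)  # deep copy so list fields are not shared
    new_record.update(overrides)
    return new_record


# Template holding fields shared by items of the same medium and collection
# (values taken from Table 2).
template = {
    "MEDIUM": "1 transparency: color.",
    "PART OF": "Farm Security Administration - Office of War Information Collection 12002-62",
    "NOTES": ["Transfer from U.S. Office of War Information, 1944."],
}

# Hypothetical new item: the cataloger edits only the fields that differ.
new_item = clone_record(template, {"TITLE": "[Placeholder title for a new photograph]"})

# The Notes field can grow later, for example after a comment has been verified.
# The note text here is a placeholder, not real catalog data.
new_item["NOTES"].append("Placeholder note added after verification.")

print(new_item["MEDIUM"])
print(len(template["NOTES"]), len(new_item["NOTES"]))
```

The deep copy matters because the Notes field is mutable: with a shallow copy, a note appended to the new record would also appear in the template. Cloning removes the retyping of shared fields, but the item-specific and free-form fields still have to be researched and written by hand.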